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FOREWORD 


This report summarizes research performed on the PREPS project 
during the first year. The project has supported three half-time 
graduate students (2 Ph. D. and 1 M.E.) and three principal investi- 
gators (Professor G. Cook was on leave from 1 September 1979 through 
3 May 1980) at 10% effort. 
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SECTION 1 


INTRODUCTION 

This research is directed at trying to capitalize on the recent 
developments in microprocessors and arithmetic support chips to design 
a reconfigurable emulator for real time flight simulation. The system 
consists of (1) a Master Control System (MCS) to perform all man/ 
machine interactions and to configure the hardware to emulate a given 
aircraft, and (2) numerous Slave Compute Modules (SCMs) which com- 
prise the parallel computational units. 

Three specific problem areas are reported on herein. First, early 
work has shown that all parts of the state equations can be worked on 
simultaneously but that the algebraic equations cannot (unless they are 
slowly varying). Thus, Section 2 reports on attempts to obtain new 
algorithms that will allow parallel updates. 

The second area involves determining the word length and step 
size to be used in the SCMs. This work is described in Section 3 as 
well as [1]. 

Finally, the architecture of the hardware and software is covered 
in Section 4 and in [2] . The architecture must clearly reflect the 
outcome of the above studies as well as conform to the conceptual 
framework of PREPS. 


1 


SECTION 2 


ALGORITHMS FOR PARALLEL PROCESSING 
2.1 Introduction 

It is a general rule that "A more accurate method is more time- 
consuming," but in a real-time simulation time is very critical. The 
advent of cheap and fast microprocessors suggests that we can solve 
problems by using multiprocessors in parallel. 

Now the question is, can we use multiprocessor systems to solve 
numerical differential equations such as 

x = f(x,t) (2-1) 

Looking at some well-known numerical integration methods, such as the 
Euler method 

x n+1 = x n * T, < x n>' <2 ' 2) 

or the AB-2 method 

x n+1 = x n * |t 3f(x n ) * f(x n-1 )] ' (2 ' 3) 

we see that we cannot obtain x n+1 before obtaining the value of f(x n ), 
the two tasks have to be done in series. Unfortunately, to update 
f(x ) is time-consuming even for moderate complexity of f(x). We need 
new algorithms that can update x and f(x) at the same time. If we 
consider only numerical integration algorithms in the form 

x n+1 = x n * T t f(x n-1>' f(x n-2>' ‘ • ' • 1 (2 ' 4) 

we can calculate f(x ) and x n+1 almost simultaneously. That is, they 

can be done in parallel. In the following section we develop several al- 
gorithms. We also have determined their corresponding stability regions 
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and tried to enlarge them. The idea appears to be very promising as a 
means of exploiting parallel processing in digital simulation. 

2.2 Development 

The general form of k step multistep integration algorithms is 

x n+1 ^JVi * Wil < 2 ' 5 > 

i=k 

where n, k are integers k < n. After setting a^ = 1 , 0 n = 0 and aj = 
0 for k < i < n, we can develop several new algorithms by the coeffi- 
cient-matching method for different values of k. For stability consi- 
derations we only take k = 2,3. 

Assuming k = 2, we have the basic form 



x n+1 x n + T ^1 f n-1 + P 2 f n-2^ 

(2-6) 

x n+3 = X n+2 + T[P 1 x n+1 + P 2 X n ] ' since X = f(x ' t} ' 
Step 1: Using the Taylor Series Expansion, we have 

(2-6) 

X n+3 

= x + 3Tx " + x n + 

n n 2 

(2-7) 

X n+2 

= x + 2Tx + < 2 ? ?x" + 111 + ... 

n n 2! n 3! n 

(2-8) 

T h x n+1 = 

T ^1 x n * T ^1 x n + T 3 pi 2? x n* + 

(2-9) 


T P 2 x n = T(S 2 x n 

(2-10) 


Step 2: Use coefficient-matching method to minimize the local 

truncation error. 
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The minimum local truncation error occurs for P-j = 2 ancl P 2 
and is 


3 

2 


E = — t 3 x'" + O(T^) 
t LT 12 1 X n UU 


and 


w n+1 


= x n + l> t 5f n-1 - 3 W 


( 2 - 11 ) 


(2-12) 


By the same procedures, we can get another algorithm for k = 3, i.3.. 


T 


x n+l = x n + -T2[ 53f n-1 ' 64f n-2 + 23t n- 3 ] (2 ' 13) 


and 


E = 55 jV + 0 (t 5) 
b LT 24 n u u 


(2-14) 


These algorithms are listed together with two other non-minimum local 
truncation error algorithms in Table 2-1. 

Now these algorithms can be implemented in parallel, because when 
we update x p+ ^ , we only need to utilize f ^ for which the update was 
begun just after getting Although this cannot be done completely 

in parallel, it gives us a hope of the further development. 


2.3 Stability 

Stability is another improtant criterion in evaluating the usefulness 
of a numerical integration algorithm. Any acceptable algorithm must be 
at least weakly stable, usually we need strong stability. The stability 
regions of the above algorithms, compared with AB methods, are shown 
in Figure 2.1. 
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Table 2-1. Truncation Errors of Several Algorithms 
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Re(Xh) 





Based on accuracy considerations, P4 is the algorithm that we are 
interested in, but it has a quite small stability region. It means that if 

s 

we want to use it, we must pick a step size sufficiently small to force 
the value of Ah to fall into the stability region and this really limits its 
usefulness. 

A new question arises: "If we change the basic form into 
x n+1 = Ax n + Bx n-1 + Cx n-2 + Dx n-3 + T[Ef n-1 + Ff n-2 + Gf n-3 ] (2 " 15) 


could we change the situation?" After further investigation, we found 
we could enlarge the stability region a bit, but not much (Detail see 
Appendix in Section 2.5). This algorithm is therefore confined to 
solving differential equations with small A if a large h is to be used. 

2.4 Summary 

In the previous section we have developed several new numerical 
integration algorithms. These new algorithms make it possible for us to 
update x n+ -j and f ^ simultaneously. Thus the algorithm can be imple- 
mented via a set of fast and low cost parallel processors. The stability 
regions have been studied and plotted. In all cases the range of 
stability was reduced by the use of stale data. Nevertheless the stable 
range was still usually large enough to accomodate a value for T of 
one-fifth the time constant (T/2 = .2 or AT = .2) which is as large as 
one normally wishes to use. 
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2.5 Appendix 


THEOREM: 

A necessary condition for stability of the linear multistep method 

a. y + a. .y ,, .. + ....+ a y = T{pk^n+k + ^k-1^n+k-1 + + ^o f n* 

k r n+k k-1 r n+k-1 o 7 n 

(2-16) 


is that no root of the associated polynomial 

P(6) = a R S k + + •••• + “o (2-17) 


may have modules which exceed 1, and that the roots of modulus 1 be 
simple. 

Since the general form is 

x +1 = Ax + Bx - + Cx ' + Dx ~ + T[Ef . + Ff ~ + Gf „] 
n+1 n n-1 n-2 n-3 n-1 n-2 n-3 

(2-18) 

where A, B, C, D, E, F, G are coefficients to be determined and T is 
step size, the first constraint, due to the theorem, is that the poly- 
nomial 

P(S) = 6 4 - A6 3 - C6 - D (2-19) 

has no roots with modulus exceeding 1 and that those of modulus 1 be 
simple. 

Using the maximum coefficient-matching method, we can solve for 
E, F, G in terms of A, B, C, D as follows: 


E = | 2 [81 - 28A - 5B - D] 

(2-20) 

F = |[2A - 2B + 2D - 18] 

(2-21) 

9 1 1 5 

G = l ~ 3 A + 12 B + T2 D 

(2-22) 
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and local truncation error also depends on A, B, C, D, 

e lt = 4 ■ I A - h c)r \ * 0(t5) (2 ‘ 23) 

Now all we have to do is to decide the possible values for A, B, 
C, D to meet the constraint. We put one root at 0, another at 1, just 
as all the other numerical methods do, and let the other two roots , 
§2 move around within a unit circle. The problem can be divided into 
two cases: 

Case 1 : 6^ , 6 2 are on the real axis, i . e. , 6^ = , 8 ^ - k 2 anc * 

k i| < 

Equation (2-19) becomes 

6 4 - (k^^+DS 3 + (k 1 k 2 +k 1 +k 2 )6 2 + = 0, (2-24) 

and 

A = (k^+k 2 +1 ) 

B = -(k^ k 2 +k 1 +k 2 ) 

C = k-|k 2 

D = 0 

The local truncation error is 

E j_-p ~ 24 “ 8 (k-j+kg) " 24^1^2^' (2-29) 

Case 2 : 6^ , 6 2 are complex conjugate, i.e., 6^ = a+jui, 6 2 = a ~j tJU 

and 

la +uj 2 < 1 . 


(2-25) 

(2-26) 

(2-27) 

(2-28) 


< 1 
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Equation (2-19) becomes 


6 ^ -(2a+1)6^ + (a^+iu^+2a)6^ + (a^ + u)^) = 0 

(2-30) 

and 

A = 2a+1 

(2-31) 

2 ? 

B = -(a +u> +2a) 

(2-32) 

2 2 
C = U Mif 

(2-33) 

O 

II 

Q 

(2-34) 

The local truncation error is 

c _ 55 3 1 , 2^ 2. 

E LT 24 ' 4 “ " 24 (a ^ 

(2-35) 


From Figures 2-2 to 2-10, the stability regions for different values 
of 61 and 62, we can see that for most cases the stability region is larger than 
that of algorithm P4 and the stability regions get bigger and bigger 
when 6-j and/or 62 approach the boundary of the unit circle. But we 
know all the existing stable multistep methods have 6^ and 8 ^ at the 
origin, and when 6^ and 62 approach the boundary of the unit circle, 
we would expect the algorithm to be less stable or less accurate. We 
need to do further investigation. 

Assume x = -Ax where A < 0. Then the difference equation 
x n+ i " Ax n - [p-ATE x n _ 1 ] - [C-ATF]x n _ 2 + ATGx^ = 0 (2-36) 

has a solution of the form 

x p = C^lj* + C 2 P" + C3P3 + C 4 pJ. (2-37) 

As T -»■ 0, p 1 = 1, P 2 = 0, P 3 = P 4 = k 2 - Here p^ represents the 
true root, and P2, P3 and p 4 are extraneous roots which arise only 
because we have replaced a first order differential equation by a fourth- 
order difference equation. For a strong stable algorithm, we have to 
let p 2 < 1, P 3 < 1, P 4 < 1. 
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Figure 2-2.' Stability regions for ki = ko on 



axis 




Figure 2-5. Stability regions for kn = 0.5, k 2 on the Re(z) axis 




Figure 2-6. Stability regions for k^ = 0.75, ^ on the Re (z) axis 






Stability region for ki = 0.25 + jo», k 2 = 0.25 










Figure 2-10. Stability region for = 0.75 



From Tables 2-2 to 2-7, which are constructed by varying the 
value of AT, = -0.27 is the upper bound for stability and accuracy 
consideration, because at this value some algorithms have one root 
greater than unity and others lose accuracy by introducing imaginary 
parts. If we want a little greater stability to suit a large AT, there 
are two choices: 

(i) x n+1 = ! x n - 1 x n-1 + ls[ 189f n-r 256f n- 2+ 87f n-3l (2 - 38> 


or 


211 _4 IV . n/T 5. 
E LT = W T x n + 0(T > 


x n+1 2x n " 2 x n-1 + 2 x n-2 * 24^ 65f n-l‘ 88f n-2 +35f n-3] 


E = 51 t 4 x IV + 0(T 5) 
h LT 48 x n UU 


(2-39) 

(2-40) 

(2-41) 


We need to perform simulations to further judge their usefulness. 
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Table 2-2a. Root Locations 


XT = -0.05 

■a 

k 2 

61 

32 

33 

LM 

70 

0 

0.951215 

0.355454 

-0.15335 + jO. 50983 

■ 

0.25 

-0.25 

0.951214 

0.401046 

-0.17613 + jO. 469974 

■ 

0.25 

0 

0.95121 

0.432307 

-0.06676 + j 0.46468 

■ 

0.25 

0.25 

0.951204 

0.48603 

-0.03138 + jO. 42803 

Hi 

0.25 

0.5 

0.951199 

0.587492 

0.10565 + jO. 36254 

1 

0.25 

0.75 

0.951124 

0.765601 

0.14164 + jO. 28612 

33 

0.5 

-0.5 

0.951209 

0.542992 

-0.2471 + jO. 35567 


0.5 

-0.25 

0.951206 

0.550895 

-0.12605 + jO. 39755 


0.5 

0 

0.951199 

0.563707 

-0.00745 + jO. 39905 


0.5 

0.5 

0.951169 

0.641303 

0.203764+ jO. 28235 


0.5 

0.75 

0.951152 

0.775251 

0.26180 + jO. 15489 


0.75 

-0.5 

0.951185 

0.757975 

-0.22958 + jO. 27415 


0.75 

-0.25 

0.95118 

0.759343 

-0.10526 + jO. 32903 


0.75 

0 

0.951171 

0.76532 

-0.01865 + jO. 33224 


0.75 

0.75 

0.951036 

0.816661 

0.59838 + jO. 133921 



Note: £3 means complex conjugate of 63. 

x means the absolute value of 8i > 1. 


Table 2- 2b. Root Locations 


XT = -0.05 

mm 

U) 

3l 

32 

83 

84 

0 

0 

0.95121 

0.35545 

-0.15334 + jO. 50983 


0 

0 

to 

U1 

0.95122 

0.310305 

-0.13213 + jO. 55091 


0.25 

0.25 

0.95121 

0.40513 

-0.07183 + jO. 46384 

63 

0.25 

0.5 

0.95121 

0.22257 

0,16311 + jO. 60885 


0.5 

0.25 

0.95119 

0.42949 

0.30966 + jO. 29071 


0.5 

0.5 

0.95120 

0.15311 

0.44784 + jO. 54781 


0.75 

0.25 

0.95117 

0.11223 

0.71831 + jO. 25441 
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Table 2- 3a. Root Locations 


XT = -0.1 

*1 

k 2 

01 

02 

S3 

04 

0 

0 

0.904584 

0.426622 

0.16560 + jO. 68500 

■ 

0.25 

-0.25 

0.904565 

0.462158 

-0.18336 + jO. 65276 

■ 

0.25 

0 

0.904503 

0.494706 

-0.07460 + jO. 63206 

■ 

0.25 

0.25 

0.904394 

0.545743 

0.02493 + jO. 58689 

■ 

0.25 

0.5 

0.904164 

0.632818 

0.10651 + j 0.51691 

■ 

0.25 

0.75 

0.90306 

0.784402 

0.15627 + jO. 43027 

II 

0.5 

-0.5 

0.904471 

0.576176 

-0.24032 + jO. 56038 

e 3 

0.5 

-0.25 

0.904409 

0.58748 

-0.12094 + jO. 57312 


0.5 

0 

0.904319 

0.604521 

-0.00442 + j 0.55899 


0.5 

0.5 

0.903739 

0.686548 

0.20486 + jO. 44321 


0.5 

0.75 

0.901951 

0.801461 

0.27329 + jO. 14908 


0.75 

-0.5 

0.903895 

0.768846 

-0.21137 + jO. 46971 


0.75 

-0.25 

0.903738 

0.771819 

-0.08778 + jO. 48934 


0.75 

0 

0.903464 

0.776442 

0.03505 + jO. 47692 


0.75 

0.75 

0.895464 

0.855484 

0.37456 + jO. 14908 



Table 2- 3b. Root Locations 


XT = -0.10 

k 

0) 

01 

02 

. .. 

03 

04 

0 

0 

0.904584 

0.426622 

0.16560 + jO. 68500 


0 

0.25 

0.904604 

0.393114 

-0.14886 + jO. 71788 


0.25 

0.25 

0.90445 

0.487811 

0.05387 + jO. 61801 


0.25 

0.5 

0.904566 

0.340651 

0.12739 + jO. 72782 

¥ 

0.5 

0.25 

0.904074 

0.571984 

0.26197 + jO. 46520 


0.5 

0.5 

0.904434 

0.297086 

0.39924 + jO. 61916 


0.75 

0.25 

0.902965 

0.274497 

0.66127 + jO. 25067 
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Table 2- 4a. Root Locations 




AT = 

-0.15 


mm 

— 

h 

32 

3 3 

34 

0 

0 

0.359341 

0.476703 

-0.16802 + jO. 82072 


0.25 

-0.25 

0.859208 

0.507451 

-0.18333 + jO. 79220 


0.25 

0 

0.858838 

0.540661 

-0.07475 + jO. 76153 


0.25 

0.25 

0.858135 

0.590594 

0.025635+ jO. 70952 


0.25 

0.5 

0.856351 

0.671962 

0.11084 + jO. 63495 


0.25 

0.75 

0.831331 + jO. 0105599 

3l 

0.168669+ j 0.541913 


0.5 

-0.5 

0.858591 

0.606184 

-0. 232687+’ jo. 709719 


0.5 

-0.25 

0.858204 

0.620489 

0.11435 + j 0.70740 

3 3 

0.5 

0 

0.857574 

0.640453 

0.00099 + jO. 68305 


0.5 

0.5 

0.85037 

0.72805 

0.20944 + jO. 55980 


0.5 

0.75 

0.841861 + jO. 0308764 

h 

0.28314 + jO. 456348 


0.75 

-0.5 

0.852558 

0.788329 

-0.19544 + jO. 61098 


0.75 

-0.25 

0.850571 

0.79468 

-0.07263 + jO. 61425 


0.75 

0 

0.846422 

0.805377 

0.04910 + jO. 59209 


0.75 

0.75 

0.867969 + jO. 045016 

3l 

0.38203 + jO. 31819 

1 


Table 2-4b. Root Locations 




At = 

-0.15 



k 

0) 

3i 

3 2 

6 3 

34 

0 

0 

0.859341 

0.476703 

-0.16802 + j 0.82072 


0 

0.25 

0.859451 

0.447507 

-0.15348 + jO. 84966 


0.25 

0.25 

0.858543 

0.542269 

0.04959 + jO. 73797 


0.25 

0.5 

0.859228 

0.415615 

0.11258 + jO. 83304 

S3 

0.5 

0.25 

0.855972 

0.637901 

0.25306 + jO. 58384 


0.5 

0.5 

0.858475 

0.405967 

0.36778 + jO. 70171 


0.75 

0.25 

0.847451 

0.614993 

0.51878 + jO. 29603 
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Table 2-5a. Root Locations 


XT = -0.20 


*2 

A 

& 2 

e 3 

B 4 

0 

0.813932 

0.520066 

-0.11700 + jO. 93686 

■ 

-0.25 

0.813359 

0.548211 

-0,18079 + j 0.91069 

■ 

0 

0.811803 

0.582872 

-0.07234 + jO. 87228 

m 

0.25 

0.808502 

0.634411 

0.02854 + jO. 81442 

■ 

0.5 

-.795513 

0.722697 

0.11590 + jO. 73553 

■ 

-0.5 

0.810375 

0.640033 

-0.22520 + jO. 83450 


-0.25 

0.808471 

0.656843 

-0.10766 + jO. 82156 


0 

0.804923 

0.681446 

0.00682 + j 0.78921 


0.5 

0.785482 + jO. 040399 

6l 

0.21452 + jO. 657418 


-0.5 

0.806713 + jO. 0390617 

h 

-0,18171 + jO. 729575 


-0.25 

0.809695 + jO. 0424832 

Bl 

-0.05969 + jO. 72174 


0 

0.814069 + jO. 0466179 

6 1 

1 

0.06093 + jO. 69197 



Table 2-5b. Root Locations 


XT = -0.2 


0) 

h 

e 2 

h 

e 4 

0 

0.813932 

0.520066 

-0.16670 + jO. 93686 

■ 

0.25 

0.814379 

0.493282 

-0.15383 + jO. 96332 

■ 

0.25 

0.810591 

0.590268 

0.94857 + jO. 84099 

■ 

0.5 

| 0.813614 

) 

0.474643 

0.10587 + j 0.92739 

03 

0.25 

; 0.794696 

0.703101 

0.25110 + jO. 68157 


0.5 

0.810778 

0.489458 

0.34988 + jO. 782657 


0.25 

0.777819 + jO. 068809 

h 

0.472181+ jO. 428535 
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Table 2-6a. Root Locations 


1 . v 


XT = 

-0,25 



B 

k 2 

0 i 

r 

e 2 

03 

04 

0 

0 

0.764619 

0.564556 

X 

X 

0.25 

-0.25 

0.762269 

0.592382 

X 

X 

0.25 

0 

0.755181 

0.632886 

-0.06903 + jO. 97124 


0.25 

1 0.25 

0.717892 + jO. 0155761 

eT 

0.03211 + jO. 90816 


0.25 

0.5 

0.7542 + jO. 0637311 

el 

0.1208 + jO. 82517 

63 

0.5 

-0.5 

0.740026 

0.69649 

-0.218258 + jO. 944256 


0.5 

-0.25 

0.726368 + j 0.0279864 

el 

-0.10137 + jO. 92315 


0.5 

0 

0.737616 + jO. 047944 

h 

0.01238 + jO. 88403 



Table 2-6b. Root Locations 




XT = 

-0.25 



k 

0 ) 

Pi 

02 

&3 

0 4 

0 


0.764619 

0.564556 

X 

X 

0 


0.766375 

0.538369 

X 

X 

0,25 


0.748818 

0.64882 

0.05118 + jO. 93331 

01 

0.25 


0.76406 

0.530168 

x 

X 

0.5 


0.748561 + jO. 068962 

h 

0.25144 + jO. 76737 


0.5 


0,753959 

0.567169 

0.33944 + jO. 85870 

1 

j 

63 
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Table 2-la. Root Locations 




XT = 

-0.27 



*1 

k 2 

0 i 

82 

83 

84 

0.25 

0 

0.716738 

0.66852 

X 

■ 

0.27 

0 

0.701398 

0.690966 

X 

X 

0.28 

0 

0.69691 + jO. 00906 


-0.05691 + j 0.99862 


0.3 

0 

0.69988 + jO. 0227805 

8 i 

-0.0498798 + jO. 99199 

8j 


Table 2- 7b. Root Locations 




XT = 

-0.27 



k 

w 

01 

02 

' 

03 

04 

0.25 

0.25 

0.697969 + jO. 0265897 

h 

0.05203 + jO. 96797 

■ 

0.5 

0.5 

0.721195 

0.605773 

0.33652 + j0.8877 


0.5 

| 

0.4 

0.70184 + jO. 056508 


0.29816 + jO. 84226 
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SECTION 3 

COMPUTER WORD SIZE REQUIRED FOR 
SPECIFIED ERROR IN NUMERICAL INTEGRATION 

3.1 Introduction 


This section deals with computational errors; a question that has 
to be examined in the light of the particular characteristics of 
microcomputers, mainly their short word length. Specifically, the 
accumulation of round off errors in numerical integration performed 
by small word computers will be investigated. This choice of effort 
was based on what seems to be a reasonable assumption that con- 
siderations on errors in numerical integration will be decisive in 
determining the minimum word length required for satisfactory sim- 
ulation. Also, since one of the main (and looked for) consequences 
of using parallel processing on a large scale will be the possibility 
to lower the sampling period of all signals and in particular the step 
size of all integrations, the discretization (or truncation) error will 
no longer be a source of great concern; a fact which dramatically 
emphasizes the new and major role played by the accumulation of 
round off error in the envisaged solution. Indeed, in many cases 
the use of an extremely unsophisticated integration algorithm like 
the Euler's method would be perfectly sufficient to make the dis- 
cretization error smaller than the accumulated round off error. 

The effectiveness of a partial use of double precision arithmetic 
will be shown in this presentation and the validity of including in the 
arithmetic an algorithm for symmetric rounding will be discussed. 

Two simple criteria will be established, in a particular but typical 
one -dimensional problem, that relate the accumulation of round off 
errors and the word length of the computer. However, no final, 
definitive, recommendation will be given regarding the minimum word 
length to be used for a given maximum error in the simulation of a 
complex problem (that is a problem involving a order differential 
equation, possibly including non-linearities). In such a case a math- 
ematical approach to the question of the accumulation of round off 
errors becomes very quickly hopeless, and no other solution has been 
found than a comparative simulation (several simulations with different 
word length) of each particular system. Nevertheless the conclusion 
that has emerged at the end of this study is that in many instances 
short word computers will be adequate for simulation purposes and 
that it will be possible to obtain valid results using extremely short 
step sizes. 
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Assuming that the dynamic system to be emulated can be divided 
into blocs of simultaneous differential equations, the simulation 
problem can be summarized, using state space representation, as the 
search for the real time solution of 

— = x' = u) (.3-1) 

dt 

where x is the state vector and u is the input vector. This vector 
equation can also be written as 

Xj - f ^(x^, Xg, • • • , u^, Ug, • • • , 

x 2 = f 2^ x l’ x 2’ * * * * x d* u l* u 2* * * ’ * u d^ ( ) 


x^ fd ^ x i » Xg* • • • , x^, u^, Ug, • • • » 

In the projected multi-microcomputer emulator each one of those d 
equations in (2) would be integrated by its own dedicated computer. 

Thus, for each individual integrator the problem can be reduced to the 
integration of one differential equation 

x) = f(x L , m) (3-3) 

where the forcing term u^ now represents the combination of all terms 
in x^ and Uj in the i*h equation of (2), except for those terms involving 
the dependent variable x^. 

In this study, attention will be given first to analysing the numeri- 
cal integration of (3) alone, where u^ is assumed exactly known, in 
order to focus on the errors introduced by the integration process 
itself. However, the preceeding assumption cannot be made when the 
equation is part of a system of simultaneous equations because of the 
dependence of the forcing term on the other state variables. In a second 
step consideration will be given to the system of equations as a whole. 


28 



3.2 Euler's Method 


Euler's method to integrate (3), where the subscripts can now be 
forgotten, is given by the algorithm 


x 


n+1 


x „ + T f( V u a> 


(3-4) 


x 


0 



where x Q is the initial condition and T is the step size of integration. 
However, this formulation does not accurately represent the relation- 
ship between the successive results of the numerical integration 
because it does not take into account the rounding of numbers in the 
various operations needed to calculate x n+ j. Assuming that the 
computer uses floating point arithmetic, a more accurate represen- 
tation of that relationship is 




n+1 


-K + [T«x 0 . u „»*} 



(3-5) 


where the symbol a is defined as the computed value of the function 
or the variable a and a* indicates its rounded value (to one computer 
word). Formulation (5) assumes that 

1) no overflow occurs at any point of the computation 

2) the step size of integration T is chosen such that it can be 
exactly expressed by a one word number. 

A more manageable form of equation (5) can be obtained by 
explicitly expressing the error 

*n+l = x n + T f(x n , u n ) + e n+1 (3-6) 

where e n+ , is defined as the local round off error [1}. It can be 
evaluated by 

e n+l = { x n + ^ x n* u n^] ' ” f x n + ^ ^ x n* u n^J 

= {x n + (Tfo n .u a )I«}* -{S„ + (Tf(x n ,S n )]*j 

+ [Tf(. . . )]* - Tf(. . . ) + Tf(. . . ) - Tf(. ..) (3-7) 
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3.3 The Local Round Off Error 


The way in which round off errors are generated in the process 
of performing a particular arithmetic operation depends greatly upon 
the implementation of the arithmetic and the type of computer used 
(there is a difference for example if the computer uses a simple or a 
double accumulator [2]). However, generally speaking, there are two 
ways to round a number to a N-bit mantissa floating point expression 
where the first bit is reserved for the sign [3], [4] : 

1) by simple truncation (or chopping), then all bits beyond the N 
most significant ones are simply dropped 

2) by symmetric rounding, then a special algorithm is required : 
first the (N+l)th bit of the mantissa is added to the Nth one, 
then only the N most significant bits of the result are retained. 

The N-bit mantissa floating point representation of any number y 
can therefore be written as 

y* = y( 1 - e ) (3-11) 


where 


Oie < 2 -N+1 

if y is simply truncated to form a N-bit mantissa floating point 
number, and 


if y is symmetrically rounded. 


The local relative round off error e cannot be known a priori for 
every number that will be used during the integration process except 
for its bounds. Therefore, the relative local round off error will be 
viewed as a random variable with uniform distribution. Moreover, it 
will be assumed that all those random variables corresponding to 
successive steps of integration are independent. Then the laws about 
summing a large number of independent random variables will be 
applicable. 
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When no special rounding algorithm is included in the arithmetic, 

W computer will naturally round the results of all operations by 
■iply truncating them. But, because of the two's complement form 
Wid to represent negative numbers, it will actually truncate their 
■ solute values [1]. Therefore, in these conditions the only safe 
F/pothesis that can be formulated about the statistical characteristics 
f the local round off error are 

FM H*nl ’ y* 2~ (N+1) ( 3-12a) 

var(e n ) = <f 2 x 2 , <T 2 = 2' 2N /12 (3-12b) 

When an algorithm performing symmetric rounding is included in the 
arithmetic, it then can be assumed that 

E(e n ) * 0 (3-13a) 

var(e n ) = CT 2 x 2 , (T 2 * 2 _2N /12 (3-13b) 
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3.4 The Accumulated Round Off Error 



} * 


Substracting (4) from (6), and defining 


r = x - x 
n n n 


as the accumulated round off error at the n*“ step of integration yields 


n+1 


= r„ + T [ f(x„. uj - f(x n . u n ) ] + e 


n 


n n 


n+1 


This equation describes the evolution of the accumulated round off 
error. It can be approximated by 


r = r + T [ ) r + (— ) (u - u )1 + e 

n+1 n 1 dx n n du n n n ,J 


n+1 


(3-14) 


In order to focus more specifically on the errors introduced by the 
integration process itself, it will now be assumed that 


u_ - u„ = 0 
n n 


so that the accumulated round off error becomes 


r . , = r +Tgr + e , . 
n+1 n & n n n+1 


(3-15) 


where 


,df v 

g n dx n 


Equations (14) and (15) relate the accumulated round off error to the 
local round off error and to the parameters of the original differential 
equation. Looking at the expression found for the local round off error 
(7) as well as at those two last equations, it clearly appears that the 
accumulated round off error depends on all the aspects of the original 
simulation problem in a rather compex manner. To approach it 
mathematically will therefore often be hopeless. However, Henrici 
[1] has established some theoretical results which will now be 
introduced without proof. 
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He has shown that assuming 

< PPn (3-16) 

var(e n ) = <T 2 q n (.3-17) 

where p n and q n are two continuous, nonnegative, piecewise smooth 
functions of t = nT , the statistical characteristics of the accumulated 
round off error are given by 

l E(r n ) |< > ¥ (m n + ° (T)) (3-18) 

var(r n ) = ^(v n + O(T)) (3-19) 

where m n and v n are two functions of t = nT obtained by solving the 
following differential equations 

m'(t) = g(t)m(t) + p(t) (3-20) 

m(0) = 0 


and 


where 


v'(t) = 2g(t)v(t) + q(t) 
v(0) = 0 


^n 


= («) 
ax /n 


(3-21) 


This result indicates that the expected value and the variance of the 
accumulated round off error, which will be assumed to have a Gaussian 
distribution, are directly proportional to the expected value and the 
variance of the relative local round off error. It also emphasizes that 
the statistical characteristics of the accumulated round off error will 
be inversely proportional to the step size of integration T , a fact that 
apparently prohibits using in the same time short word computers and 
small step sizes. Partial double precision will change this fact. 
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Example 

Consider a simple first order system with a unit step input 

x'(t) = -Lx(t) + 1 (3-22) 

x(0) = 0 

The exact solution to this problem is given by 

x(t) = |(1 - e" Lt ) . (3-23) 

Lj 

It is represented by a broken line in figure 1, for L = 1 and L = 0. 1 . 

If the numerical integration were perfect, its results would correspond 
to points on those curves. Actually when the simulation is performed 
with a 9 -bit mantissa computer (N = 8), for example, the curves 
obtained are quite different as shown on the figure. The output of the 
integrator becomes constant at a level much below the limit toward 
which it is supposed to tend. This shows that there is a minimum rate 
of change of the variable to integrate under which the numerical 
integration becomes inoperative. What happens can be expressed by 

*n+l = Pn + [ T f( V “n’ 1 )* = 


or 


-NT 

T f(x n , u n ) 2 iN (exponantial part of x n ) 

x' < 2 ^ (exponantial part of x n ) 
n T 

When T and N are small this phenomena may generate very large 
errors. For example, setting L = 1. in (22) and integrating (22) with 
a 9-bit mantissa computer will introduce a steady state error of 
about 100% when T = 1/128 sec. and an error of about 300% when 
T = 1/512 sec. 
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Fig. 3-1. Integration of x' = -Lx + 1 with a computer having a 9-bit 

mantissa (including the sign bit). The exact solution to this differential 
equation is indicated with a broken line. 
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3.5 Partial Double Precision 


As previously mentioned (10), in most cases of real time simula- 
tion the induced error forms the largest part of the local round off 
error. To eliminate it would therefore result in a substantial gain in 
accuracy, particularly when short word computers are used. This 
can precisely be achieved at low cost (a very small increase in 
computation time) by performing the addition on the right side of (4) 
in double precision. Henrici [1] refers to this algorithm as "partial 
double precision". 

Using partial double precision, the equation expressing the real 
results of the numerical integration, taking the effect of rounding into 
consideration, becomes 

*n+l = {*n + T " (5 n'“n } ) # (3 ~ 25) 

where the following definitions apply : 

1) x is the double precision computed value of the quantity x 

2) x* is the value of x after it has been rounded to form a single 
precision quantity 

3) y^ is the double precision rounded value of the quantity y 

4) f is the single precision computed value of the function f. 

It is here emphasized that although x n will be obtained step after step 
as a double precision variable, only its single precision value is used 
to compute the function f(. . . ). Therefore, this last computation which 
in many instances will be the most time consuming especially when 
the function is highly non-linear, will not require any additional time. 
Then, if it is assumed that the function f(. . . ) can be calculated 
accurately enough so that the only error degrading its value is its 
final rounding to a single precision number 

f(x*, u*) = f*(x*, u*) 
n n n n 


and (25) can be somewhat simplified 


k n+l 


■ {> 


n 


+ Tf 


•(x*. 


(3-26) 


Once again, equation (26) may be rewritten in a more manageable form 
as 
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Tf(x , u) 
ri n 


+ e 


n+1 


( 3 - 27 ) 


where the local round off error is explicitly expressed. Equation (27) 
appears to be formally identical to equation (6), however two major 
differences should be noted : 

1) x n+ j is now a double precision value, and this alone suggests a 
substantial gain in accuracy 

2) e n+ ^ is very different than in. equation (6), and will actually 
absorb part of the gain. 


The local round off error can now be estimated. As before, in 
order to focus on the errors introduced by the integration process 
itself, it will be assumed that the forcing term is known with an 
absolute accuracy. Then 


And 


u 


n 


= u 


n 


u* 


e n+l * + T %>}* - K + T Vl 

♦ Tf*(x*.u n > - Tf(x*.u n ) 

+ T f(x», u n ) - T f(x Q , u n ) 


= {...}* -{...} + T [ f*(. . . ) -«...)]■ 


+ Tg n (5S - x n ) 


( 3 - 28 ) 


where 


g 


df(. . . ) 


n 


dx 


Following the same line of reasoning as before, the quantities 


(x^ - x) 
n n 


and 


lf*(...)- - f( > ] 
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will be viewed as two random processes. Specifically it will be 
assumed that 

Nj ‘ < K'l x nl 

when numbers are simply truncated, or, 

E(x* - x) = 0 
n n 

when numbers are symmetrically rounded, and 

var(x* - x ) = 

nan 

in both cases. In the same way, it will also be assumed that 
|E[f*(. ..) - f( — )j| < p'|f(x n ,u n )| 
when numbers are simply truncated, or 

E[f*(. . . ) - f(...)] = 0 

when numbers are symmetrically rounded, and 

var[f*(. . . ) - f(. . . )] = u^) 

in both cases. For all those expressions 

p\ _ 2~(N+1) 

< V 2 = 2”^ N+1 Vl2 

The statistical characteristics of the local round off error, 
simple truncation is used, can then be found by introducing (29) 
into (28), so that 

+ T Kl f( V u n ) l + Tg nK'l x nl 


where 

jj _ 2“0V1+1) 


(.3- 2 9a) 
(3-29b) 
( 3-29c) 
(3-30a) 
( 3- 30b) 
( 3- 30c) 

( 3-31a) 

( 3-31b) 

when 
and (30) 

(3-32) 
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and M is the number of bits in the mantissa (not including the sign bit) 
of a double precision number. Also, applying the rules regarding the 
summation of independent random variables, it can be found that 

var(e n+1 ) = cr 2 xg + T 2 CT ,2 f 2 (x n . u n ) + T 2 g 2 (T 2 x 2 (3-33) 

where 


( T 2 = 2~ 2M /12 

When symmetric rounding is used for all computations the expected 
value of the local round off error is of course assumed to be 

E(e n+1 ) = 0 (3-34) 

while its variance is still given by (33). 

Now, assuming that y is very small compared to yj' and 0" is 
very small compared to QT' , the preceeding expressions can be 
somewhat simplified. Equation (32) becomes 

£ (e n+l>| < T H| f(x n» V| + gtK|) (3-35) 

while equation (33) now reads 

var(e n+1 ) = T 2 (T' 2 [ f 2 (. . . ) + g 2 x 2 ] (3-36) 

This last assumption could probably not be made in most cases where 
a large computer is used but it will be valid in many cases involving 
short word computers. For example if a 16-bit microcomputer has its 
word divided into 7 bits for the exponantial part and 9 bits for the 
mantissa, its double precision mantissa will have 25 bits and the first 
term in (32) and (33) will be negligible compared to others for most 
realizable T's. 

At this point, Henrici's formulas to evaluate the statistical para- 
meters of the accumulated round off error [see (18) and (19)] can be 
called upon to notice that because of the factor T present in the 
expected value of the local round off error (35), when partial double 
precision is used together with simple truncation, the expected value 
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of the accumulated round off error will be independent of the step size 
of integration and much smaller (very roughly of a factor 1/T ) than in 
the case where partial double precision was not used. In the same way, 
it appears that because of the factor present in the expression of 
the variance of the local round off error, the general effect of the 
accumulation of round off errors will decrease with the step size of 
integration when both partial double precision and symmetric rounding 
are used. Indeed, in this last case the expected value of the accumula- 
ted round off error would (at least theoretically) be zero, while its 
variance would be proportional to T. This is certainly the most inter- 
esting consequence of using partial double precision. It should be 
emphasized however that in order for this to be true the first terms 
in both (32) and (33) must always (for any n) be negligible compared 
to the other terms. If, for example, T were very small and as a 
consequence double precision would not be enough to fulfill this 
requirement, it could be of interest to program some kind of triple or 
even quadruple precision algorithm. To do so would not represent a 
very difficult problem since the corresponding extra-long variables 
would have to be used in an addition only. 

There is a second very positive consequence of using partial 
double precision : the minimum rate of change of the variable to 
integrate under which the integrator will not be able to follow properly 
is now given by 

_-M 

x' <-? (3-37) 

n ^ T 

Since M is considerably smaller than N, (37) represent a substantial 
improvement compared to (24) when a short word computer is used. 
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First Example 


Consider a simple linear first order system with unit step input to 
be simulated by a short word computer using partial double precision. 
The differential equation to integrate is 

x 1 = -Lx + 1 , x(0) = 0 (3-38) 

Its exact solution is 

x(t) = 1.(1 - e - ^) (3-39) 

L 


The expected value of the local round off error can be found using (35). 
It should be noted though that because the solution x n will always (for 
all n) be positive the absolute value signs in (35) may be removed and 
the inequality now becomes an equality. The quantities f(x n , u n ) and 
x n cannot be known exactly a priori, however assuming that the results 
of the numerical integration will closely follow the exact solution, those 
two quantities can be approximated using the exact theoretical solution 
(39). Then 


Therefore, 


or, 


1(1 - 

e -L„ T) 

( 3-40a) 

L 



V = 

e -LnT 

( 3-40b ) 

-L 


( 3-40c ) 


(35) gives 

E(e n+1 ) = -Tp'[2e' LnT - 1 ] 

E(e(t)) = -Tji'[2e" Lt - 1] (3-41) 


The expected value of the accumulated round off error can now be 
estimated using (18) and (20). Equation (20) becomes 

m'(t) = -Lra(t) - T[ 2 e -Lt - 1] 


m(0) = 0 
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Its solution is 


m(t) = 1[1 - (l+2Lt)e' U ] (3-42) 

L 

It then follows that 

E(r ) = £'[ 1 - (1 + 2LnT)e“ LnT ] (3-43) 

L 

In the same way, the variance of the local round off error can be 
approximated by 

var( e n + i) = T 2 CT* 2 [ e -2LnT + (1 -e" LriT ) 2 ] (.3-44) 

And the variance of the accumulated round off error can be estimated 
using (19) and (20) ; it has been found to be 

var(r ) = [ 1 - 4e‘ LnT + (3 + 4LnT) e _2LnT ] (.3-45) 

n 2L 

The next two figures represent the evolution of the expected value of the 
relative accumulated round off error, that is E(r n )/x n , and of the 
variance of the relative accumulated round off error, that is var(r n )/x n 
calculated according to (43) and (45). 

Experimentation relatively to this example has been conducted in 
the following way. Equation (38) has been simulated with two parallel 
integrations. For the first simulation the full length of the mantissa of 
the computer (a HP2100 using a 32-bit word with a 23-bit mantissa) was 
used. For the second integration only 9 bits of the mantissa were 
retained for calculations (that is the mantissa was truncated or rounded 
to a 9-bit expression after each arithmetic operation). However, in 
this second computation the full length of the mantissa was used for the 
"double precision" variables. The accumulated round off error 
corresponding to a 9-bit mantissa was then estimated as the difference 
between the results of those two integrations. Furthermore, in order 
to obtain an average value and an estimated variance, for each set of 
parameters, equation (38) was simulated 21 times giving for each pass 
a slightly different amplitude to the forcing term (the 21 different 
amplitudes were evenly spread between 0.95 and 1.05). Some of the 
typical curves obtained in the course of this experimentation are given 
following the figures representing the theoretical curves. 
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Fig. 3 t 2. Expected value of the relative accumulated round off error for 
the first example. Theoretical curve corresponding to equation (43). 



Fig. 3-3. Standard deviation of the relative accumulated round off error for 
the first example. Theoretical curve corresponding to equation (45). 
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Fig. 3-4. Relative average accumulated round off error corresonding 
to a real simulation of example 1 (9-bit mantissa). — 





Fig. 3-5. Relative average accumulated round off error corresponding 
to a real simulation of example 1 (9-bit mantissa). 


45 



RELATIVE STANDARD DEVIATION OF ACCUfl. 
[*.ooi ROUND OFF ERROR 


2.5C- tSINPLE TRUNCATION) 



Fig. 3-6. Relative standard deviation of the accumulated round off error 
corresponding to problem 1 (9-bit mantissa). 
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F^S* 3-7. Relative standard deviation of the accumulated round off error 
corresponding to problem 1 (9-bit mantissa). 
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Fig. i3-8. Relative average accumulated round off error corresponding to 
example 1 (9-bit mantissa). 



Fig. 3-9. Relative average accumulated round off error corresponding to 
example 1 (9-bit mantissa). 
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Second example 


Consider now a linear first order system with a sinusoidal input 
to be simulated using a short word computer and partial double precision. 
The steady state operation will be assumed in order to examine whether 
the accumulated round off error will also reach a steady state type of 
evolution, or on the contrary will grow without bounds. The differential 
equation to be integrated is 

x' = -Lx + Usinwt (3-46) 


Its steady state solution is 

x = A sin(wt +¥') (3-'47a) 

where 

Y* = arctan(-w/L) (3-27b) 

A = U/(LcosY > - wsinj?) (3-27c) 

Using (35), the local round off error is found to be 

|e ( e (t) )J Tji' [ |-LA sin(wt +<f) + U sin wt| 

- LA|sin(wt +^f )| ] 


or, simplifying, 

|E(e(t))|< UTp'|sinwt| (3-48) 

An approximation for (48) can be obtained by the first terms of its 
Fourier series. Then, 

E(e(t))|^ UTji'[j^ - Icos 2wt ] (3-49) 

Now the contribution of the two terms on the right side of (49) to the 
accumulated round off error can be examined separately. The constant 
term gives, using (20), 

m 0 (t) = 2UT(1 -e' Lt ) 

L 
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Therefore (18) becomes 


Ktv < ( 1 - 

and for t large enough 


e" Lt ) 


E(r ) < >*' 2U 

n 1 a v ^ XL 


The cosine term in (49) gives, using (20), 


( 3 - 50 ) 


(3-51) 


where 


m c (t) = Bcos(2wt +Y) 

(3-Sla) 

Y - arctan(-2w/L) 

(3-51b) 

B = 4U/ic(Lcosy / - 2wsinV / ) 

(3-51c) 


Formula (51) shows that the accumulated round off error will 
indeed reach a steady state type of evolution, except when L = 0, 
then it will grow without bounds. This result seems to emphasize 
very strongly the importance of the ratio U/L which will determine 
the value of the DC term of the expected value of the accumulated 
round off error. Formula (52) indicates that this last quantity will 
oscillate around its DC term, but also that the amplitude of the 
oscillation will be limited. Indeed, (52c) shows that 


- If w ^ L 

, then 

B - 2 | E(r 4>v 

- If w ~ L/2 


B “ Kit* 

- If w » L/2 


B «l E(r ntv 

The variance of the local round off error can be found for this 

example using (36). 




var(e(t)) = T 2 <T' 2 [ ( -LA sin(wt +V > ) + U sinwt ) 2 
+ L 2 A 2 sin 2 (wt +Y) } 
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Fig. 3-11. Standard deviation of the accumulated round off error corres- 
ponding to example 2. Two similar simulations with w=l. and w=. 1 
have yielded the same curves except for the ripple. 
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X' • -UX + $IN(U*TIrlE ; , U-.314 , T-l/128 


Fig. 3-13. Simulation corresponding to the second example 
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or, simplifying, 

var(e(t)) < T 2 Cf ,2 (U 2 + 2L 2 A 2 + 2LA ) (3-52) 

Then, the accumulated round off error is given by (21) and (19). It was 
found that 

v(t) ^ 1 ' e " 2Lt ) (3-53) 

2 i-j 

where 

C = (U 2 + 2L 2 A 2 + 2 LA ) (3-54) 

From (53) it can be deduced that the variance var(r n ) will be bounded. 
Indeed, (19) gives, for t large enough, 

var(r n ) m ^ (3-55) 

n ^ 2L 


Experimentation regarding this example has been conducted 
following a procedure similar to the one developed for the first example. 
Some of the most typical curves obtained are given in the following 
figures. It results from the various simulations performed that when 
numbers are simply truncated formulas (50) and (52) give a quite good 
description of the expected value of the accumulated round off error. 

It is clear that this quantity can become very large when L is very 
small. Simple truncation is therefore not advisable with short word 
computers. When symmetric rounding is used, the expected value of 
the accumulated round off error is very small, near zero, as antici- 
pated. Then, the most important characteristic of the accumulated 
round off error becomes its variance or standard deviation. Curves 
relative to this quantity are also given. They reveal that the variance 
of the accumulated round off error is badly described by the various 
expressions established so far. No satisfactory explanation has been 
found regarding this fact. However, the size of the standard deviation 
is a comforting factor which sustains the hope of being able to use 
short word length computers for numerical integration. 
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3 . 6 Effect of The Limited Accuracy of The Forcing Function 


In many cases of real time simulations, the inputs of the emulated 
dynamic system will be obtained from analog to digital converters. 

The accuracy of the corresponding forcing functions will therefore be 
limited by the number of bits given by the conversion.lt can easily be 
shown that the accumulation of round off errors degrading the results 
of an integration will, in such a case, be dominated by the effects of 
the rounding of the forcing term, provided the computer uses a mantis- 
sa somewhat longer than the word of the A/D. 

For simplicity, the function f(x n , u n ) will be assumed linear. It 
can therefore be written as 

1 ( vV"V« n . <3 - 56) 

where L n may be a function of n and where u n represent the forcing 
function. The equation corresponding to the evolution of the accumula- 
ted round off error (14) now becomes 

r n+l = r n + T f “ L n r n + ^ n ] + e n+1 (3-57) 


where 


Au n = u n - u n 


Equation (57) can be approximated by 


r' = - L_r„ + TAu„ + e 


n 


m n 


n 


: n+l 


( 3 - 58 ) 


( 3 - 59 ) 


The term e n+1 is obtained as before from (27) and (28). And, assuming 
an efficient use of partial double precision, it is found that 


(E^n+^l < Tu '[| f(x n’ u n>l ' L n| x tJ] + Tt }| u n| 

where ^ is the expectation of Au n . Also, 

var(e n+1 ) = T 2 cf’ 2 [ f 2 (. . . ) + L 2 x 2 ] + T 2 ^ 2 u 2 


(3-60) 


( 3 - 61 ) 
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where d 2 is the variance of Au n . Since the function f(. . . ) has been 
assumed linear, these last equations can be simplified. Thus, 

F^i+l 1 4 T f'| u n| + T 7l u n| (.3-63) 

and 

var(e n+1 ) = T 2 CT* 2 [ 2L n x n ' 2L n x n u n + u nl + n 

(3-64) 


The expectation and variance of the term Au n viewed as a random 
variable are known since u n is given by an A/D. They are, assuming 
that the A/D gives K bits with symmetric rounding, 

^ = 0 C3-65a) 

l j 2 = 2~ 2K ( 3-65b) 

12 

If the computer performing the integration uses symmetric rounding 
and has a mantissa a few bits longer than the word of the A/D, the 
first terms in (63) and (64) will be negligible compared to the second 
ones, so that then 


E(e n+ i) = 0 ( 3- 66a) 

var(e n+1 ) = T 2 # 2 u 2 (3-66b) 

These last expressions combined with (59) show that the accumulated 
round off error at the output of the integrator will depend on the 
rounding of the input only. Asa consequence, if for example the input 
(or forcing term) is given by a 8-bit A/D, a computer with a 12-bit 
mantissa will perform as well as a computer with a 40-bit mantissa. 

Experimentation on this point has been conducted following a 
similar procedure as for the preceeding simulations. The results of 
a first integration performed with a 23 -bit mantissa were compared to 
those obtained in a second inte gration performed with a mantissa 
reduced to 9 bits by symmetric rounding. The same simulation has 
been then repeated with, for the second integration, a mantissa reduced 
to 12 bits by symmetric rounding. The average value and standard 
deviation of the difference are given in the following figures. 
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X' - -UX t SINUJiTIHE ) . U-.3H , T*l/128 

Fig. 3-17. INPUT ROUNDED TO 9 SIT MANTISSA F. P. N. 
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Fig. 3-19. INPUT ROUNDED TO A 9 BIT MANTISSA F. P. N. 
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■ x' • -i*x ♦ siN<umnE) , u*.3i4 , t-i/32 
Fig. 3-20.' ■ INPUT ROUNDED TO A 9 BIT MANTISSA F . P. N. 
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X' • H-L*x + SINCU*TIME) , U* .314 , T-1'128 
Fig. 3-21. INPUT ROUNDED TO A 9 BIT MANTISSA F. P. N. 
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STANDARD DEVIATION OF ACCUfl. ROUND OFF ERROR 



.Pig- ,.3 t 22. INPUT ROUNDED TO ft 9 BIT NANTISSft F. P. N. 


STANDARD DEVIATION OF ACCUfl, ROUND OFF ERROR 


«.eoei 


(SYMMETRIC ROUNDING - 12 BIT MANTISSA ) 


3 . 7 Integration of a System of Differential Equations 

As mentioned in the introduction, integrating a system of simul- 
taneous differential equations can be done with parallel operations by 
assigning a separate integrator to each individual equation. After each 
step of integration the results of those independent computations will 
be exchanged among the integrators for the updating of the various 
forcing terms. As a consequence, round off errors will propagate 
from differential equation to differential equation and may become very 
large. To study their accumulation mathematically will in many cases 
be hopeless, especially if the equations are non-linear. Then, simula- 
tion will be the only way to obtain an evaluation of their effect . 
However, the observations made in the preceeding sections apply to 
integrating systems of equations as well as single equations. Therefore 
it can be said that if partial double precision and symmetric rounding 
are used, and if the coefficients L in the equations are not all very 
small, the accumulated round off error will remain small. 


Example 

The second order differential equation 

x" + 2^w c x' + w^x = u (3-67) 

has been simulated as a system of two simultaneous equations 

x' 

y' 

where the forcing term u was a sinusoid of unit amplitude. As for the 
previous experimentations the integration has been performed twice : 
once using the full length of the mantissa (23 bits), then reducing the 
mantissa to 9 bits by symmetric rounding. The average value and the 
standard deviation of the corresponding accumulated round off error 
has been calculated and is shown in the following figures. 


= y 


“ w c x 


2/Cw c y 


+ u 


(3-68) 
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Fig. 3-24. Simulation corresponding to the example on page 36. 
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X' • V _ - -• •- 

V' -.2*Y + SIN(UXTIME) , U-.314 


Fig. 3-25. Simulation of a system of second order. 


STANDARD DEVIATION OF ACCUPh ROUND OFF ERROR 



V' - -.1*X - .2*V ♦ SINCU*TIP1E) . U«.3H 

Fig. 3-26. Simulation of a system of second order. 
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Fig. 3r-27. 



Simulation of a second order system. 

STANDARD DEVIATION OF ACCUM. ROUND OFF ERROR 

(SYPIPIETRIC ROUNDING) 


| *.001 



3.0ft- 



TIP1E (SEC "> 

X' ■ Y 

Y' • - X - .5*X + SINIMTIHE) , U-.314 



Fig. 3-28. Simulation of a second order system. 
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3.8 Conclusions 


The mathematical approach to the problem of the accumulation of 
round off errors in numerical integration leads to two general observa- 
tions. First, all the parameters of the specific simulation under 
investigation have to be considered; general rules and formulas about 
the evolution of the accumulated round off error can therefore be 
established only for a limited range of well defined problems. Second, 
in many cases this theoretical approach will be hopeless (especially 
when non-linear equations or large systems of simultaneous equations 
are involved) because it calls for solving differential equations which 
will be at least as complex as those defining the original problem; 
comparative simulation will then be the only way to obtain useful 
information on this question of round off accumulation. 

Regarding the specific question of using short word computers for 
numerical integration, this research has shown the gain to be realized 
by a partial use of double precision, or partial double precision in the 
sense of Henrici. Indeed, if the ratio between the length of the double 
precision mantissa and the length of the single precision mantissa is 
large enough, the problem of the accumulation of round off errors will 
be drastically changed. In the worst case the accumulated round off 
error at any point of the integration will become independent of the 
chosen stepsize of integration; in the best case it will slowly decrease 
with it. This fact certainly opens the door to using microcomputers 
working with very short' sampling periods to perform real time 
simulation. 

At the same time the effects of partial double precision were ; 
studied, those of using simple truncation or symmetric rounding were 
also analyzed. Because of the large difference in magnitude between 
the expected value of the accumulated round off error and its standard 
deviation, symmetric rounding (which results in making the expected 
value go to zero) becomes a necessity with short word computers. 

It has been shown that this choice combined with partial double preci- 
sion will result in very small accumulated round off errors (in many 
instances smaller than the least significant bit) when the coefficient 
of the dependent variable (L) is not too small compared to the ampli- 
tude of the input signal. For this particular analysis the input signal 
was assumed exactly known (with an infinite accuracy). 
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A separate look has been given to the question of the accumulation 
of round off errors when the input signal to the emulated dynamic 
system is given by an analog to digital converter. It has been found 
that then a computer with a mantissa slightly longer than the converter 
word will perform as well as a computer with a much longer mantissa, 
provided symmetric rounding and partial double precision are used. 

The consequence of this observation is again that microcomputers 
will be able to handle simulation as efficiently as much larger and 
more expensive units when the problem of emulating a particular 
system requires analog to digital converters. 
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SECTION 4 


PREPS ARCHITECTURE AND IMPLEMENTATION 
4.1 Introduction 

This section is a summary of a one year effort in studying and 
developing a multimicrocomputer system to be used in parallel simu- 
lations. Emphasis was placed on the practicality of reconfigurability, 
and the compatability of software and hardware. A demonstrative 
system was built during this year and is described herein. 

The system design chosen to meet the objectives is reminiscent of 
analog computer architecture and uses special computer modules-- 
working in parallel-- to allow a simple definite assignment procedure for 
various parts of the simulation. 

The special representation of the simulated system- which led to 
the hardware organization- is described first, followed by a complete 
description of the hardware and software. 

This project is based on the premise that a multimicroprocessor 
simulator has potential for being practical if the required hardware- 
software combination can be made reconfigurable in a practical manner. 
A multimicroprocessor system is envisioned as consisting of a large 
number of interconnected microprocessors and a minicomputer. The 
rapid technological developments and improvements in LSI microproces- 
sors and arithmetic support chips make a special purpose, reconfigur- 
able, parallel processor dedicated to real time flight simulation an at- 
tractive alternative to a standard, large computer complex. It has been 
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clear for some time that increases in processing speed through techno- 
logical advances have almost reached an upper limit. Thus, attention 
has been focused on architectures which exploit parallel processing of 
data. The advantages are quite significant: 

Optimal (minimal) cycle time reduces the phase shift intro- 
duced into the process being simulated. 

Short cycle time decreases truncation error. 

Easy way to expand the system is provided by adding inde- 
pendent computing modules. 

Better reliability achieved by the modular nature of the 
design, also providing ease of maintenance. 

The system to be described suggestes two levels of reconfigura- 
bility. Flexible software modules provide the user an initial command 
language similar to analog computer program diagrams. These modules 
are designed to simplify the job of generating the simulated system, by 
providing blocks of programs which occur frequently in a particular 
form. Simple examples are integrators, weighted sums, non-linear 
functions, etc. These blocks, although fixed in regular use, can be 
redefined easily, or others can be added as needed to the system 
library. Integrations, for example, can be switched from Euler method 
to Adams-Bashforth method or any other desired one. 

Sophisticated arithmetic units connected by a common bus to a 
master computer comprise the hardware. Reconfigurability is achieved 
by independent operation of each unit and by the asynchronous mode in 
which they interchange data. Finally, man/machine programs supply 
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the relations between software and hardware under specifications given 
by the user and under optimization algorithms that reconfigure the 
system to give best results as to speed and performance. 

4.2 System Software 

A state variable description of the dynamical (possibly non-linear 
and time varying) system is the basis for the block diagram representa- 
tion of the simulation system. Independent parts of the differential 
equations can be evaluated simultaneously, thereby achieving paral- 
lelism. For example, consider the simple system given by 

XI = X2 - K4*X1 + K2*X3 (4-1) 

X2 = X3 + K1*R + K4*X1 

X3 = 2*X2 - K2*X3 + K2*R 

with initial conditions 

X1(0) = X10 (4-2) 

X2(0) = X20 
X3(0) = X30 

This system can be represented in block diagram of the kind shown in 
Figure 4-1. This structure is similar to that suggested for direct 
execution processors by Korn [3] but is herein used for programming a 
multimicroprocessor system. This necessitates a "library" which in- 
cludes multiplications, divisions and integrations as blocks of programs. 
We can see that the terms K2*X3, K2*R, K4*X1 can be evaluated once 
(per cycle) although used several times in the system of equations. 
The block diagram representation eliminates multiple evaluations by its 
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Fig. 4-1. Block Diagram Command Language 
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nature. The terms K4*X1, K2*X3, K1*R, K2*R can all be in process 
simultaneously and this is where the multimicroprocessor system has 
advantages over sequential processing units. 

The library blocks are written in the microprocessor's assembly 
code or preferably microcoded and stored in local PROMs. Assignment 
of blocks to a processor is done by loading the corresponding command 
codes into the local RAMs. There they appear as 


MULT 

A1 

XI 

K4 

SUB 

A2 

X2 

A1 


where MULT and SUB are command codes, A1 and A2 are labels of 
results and the other are operands. Command codes can be library 
functions or input-output definitions. I/O codes are carried out by 
special purpose hardware interfaces hooked to the system bus. 
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A task list for the example is shown in Table 4-1. This list 
indicates the fact that tasks may need results from other tasks as their 
operands. 


Table 4-1 . 
Task List 


A„ = 


K 4* X 1 


A n = 


A 1 + A 2 


A 2 = K 1 

* R 

A 9 

= A S + X 3 

A 3 = K 2 

* y 
X 3 

A 10 

00 

< 

1 

< 

II 

C\J 

II 

< 

* R 

A 11 

A 5 + A 10 

A - 2 * 
A 5 c 

X 2 

X 1 

= INT (A 7 ) 

A 6 = A 3 

' A 1 

X 2 

= INT ( A g ) 

> 

II 

> 

cr> 

X 2 

X 3 

= INT (A 11 


It is noted again that the "library" of tasks was chosen just for 
the demonstration, and more complicated functions may be programmed 
by the user. At the preceding step the list is broken down into several 
parts, and each of the available computers gets some of the tasks to 
be carried out. A timing diagram of a process (given by the example) 
is shown in Figure 4-2. The time needed for a completion of a task 
depends on the hardware speed, and is discussed later. Each library 
function has an associated execution time which can be measured experi- 
mentally when created. Those periods of time are used by an optimization 
algorithm, which shares the tasks between the computers (before the 
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beginning of the simulation process), thus minimizing waiting loops 
which may occur when a task in process needs the result from one that 
has not been processed yet. In the example shown no such situation 
occurs, and the simulation program is processed efficiently. 

4.3 System Hardware 

As viewed from the block diagram representation there are two ways 
to achieve parallelism: 

1. Simultaneous processing of tasks by the available microcomputers. 

2. Increasing efficiency in data flow between modules. 

The first point indicates that a given simulation system should be analyzed 
first and broken into subprocesses in which tasks do not depend on 
each other's results, thus processed simultaneously. The second point 
implies a special hardware architecutre. If we assume that several tasks 
may need the same piece of information, an inefficiency may occur when 
the common bus is occupied more than once to transfer the same 
information. An interrupt system and preanalysis of the given simulation 
can solve this problem, based on a "write only" bus architecture. This 
system is conceptually identical to a shared memory system where all 
computers use a common memory to read and write. The difference is 
that instead of one memory module we have an image of it in each local 
memory. The advantage of the image idea is that although the need 
for a piece of data arises asynchronously in the computers, which might 
lead to several transfers of the same data from the memory, the writing 
occurs only once and the data is distributed in parallel to the computers 
that need it. 
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Each of the computers is pre-programmed to detect the appearance 
of an operand needed by one of its tasks on the common bus. The 
interrupt system, controlled by the master computer, gives a "bus 
granted" not to processors that need an input but rather to processors 
that have intermediate results to output, and are idle until those results 
are accepted by the other processors. This way several modules can 
be interrupted at the same time, and perform input operations from the 
bus. The optimization provides a way to organize the tasks performed 
by each computer, so that their operands are available in the local 
memory when needed. The suggested organization that meets the 
above description is shown in Figure 4-3. 

The master computer performs man/machine communications 
before the simulation begins. It is used as an interrupt controller 
in the real time run. The number of slave processors is limited by hard- 
ware restrictions only. 

Two different buses connect the computers to each other: 

1. DMA bus. 

2. System bus. 

The first one is used to load the list of tasks into the local memories dur- 
ing system generation. It also provides access to the internal parts 
of each module, so the user can check registers, memory bytes and 
interface status. If the system library is needed to be very flexible, 
it can be written and held on disc and loaded by DMA into local RAMs 
for each simualtion. 
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Fig. 4-3. System Organization 




The system bus interconnects the computers for transfer of 
intermediate results, data and status. It includes a 32-bit data path 
and an 8-bit control path. 

Ultimately, the slave computers will be realized with bipolar, micro- 
programmable bit slice microprocessors because of the associated short 
cycle times and variable word length. To reduce costs and ease construction 
of a feasibility experiment, the current version combines the utility of 
INTEL-8085 microprocessors with the speed of Advanced Micro Devices 
AMD9511 ALUs. This arithmetic unit is stack oriented and can perform 
80-|Jsec multiplications and divisions, 180-psec additions and subtractions 
as well as several trigonometric and logarithmic functions, all done in 
32-bit floating point arithmetic (24 bit mantissa, 6 bit exponent, 2 
bits for signs). 

As shown in Figure 4-4, each computer is comprised of two main 
parts: 

1. Arithmetic logic. 

2. Communication logic. 

The 8085 microprocessor and AMD9511 provide the number-crunching 
capabilities. The AMD9511's data path is - 8 bits wide, perfectly suited 
to the internal bus structure and is treated as an interface port. 

Its activation is done by loading the two operands and the opcode. 

Its busy line is used to check completion of calculations which, in 
turn, interrupts the microprocessor. Status bits indicate overflow, 
underflow, negative argument, division zero by zero and argument too large. 
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Fig. 4-4. Computing Module 
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The relatively long time needed by the ALU for calculations is used 
by the processor to finish the overhead operations like communication 
with other processors or memory operations. 

Upon completion of a task, the results are transferred to the output 
latches. These are (32 bits wide) tristate flip flops, controlled by the 
communication logic which, in turn, is connected to the master interrupt 
controller. The microprocessor goes into a waiting loop if previous 
results were not taken by the system from the output latches. 

The DMA components include gates to the common bus and 
interrupt logic. A DMA request by the master computer generates 
an interrupt in the slave processors and a jump to the monitor. 

This monitor allows major registers and pointers in memory to be 
examined or changed by the user. The registers are restored and 
the simulation programs take over upon a start command by the 
user. 

Communication logic handles address and status bits used by the 
interrupt system. It consists of a one word comparator (6 bits) and 
programmed Ik word comparator. Upon completion of a task, the cor- 
responding task number is loaded by the local CPU into one side of 
the first comparator. The comparator is activated when the same 
number appears on the system bus--sent by the master computer which searches 
for ready tasks. The output gates are opened upon equality, and data 
flows to the system data lines. Also, a status bit (called s^) goes low, 
indicating to the master computer, as well as to the other computers, that 
data is ready on the bus. 
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A local operand list created in advance for each computer when 
tasks are assigned is used to generate interrupts for read cycles. 

This list consists of task numbers whose results are used by the pro- 
cessor during the simulation cycles. The list is loaded as ones and 
zeros in the corresponding addresses of a 1 bit wide Ik RAM used 
as comparator. Appearance of one of these addresses on the common 
bus along with low level status bit (s^) causes an interruption in 
the local process and performance of input operation. Input of data 
by a slave computer is done as follows: 

1. Set a status bit (si) to indicate input operation. 

2. Read data (8 bits at a time four times). 

3. Reset status. 

The data source computer checks si reset before releasing the bus, 
otherwise an input error occurs in the reception. Pre-programming 
of the Ik RAM is done by DMA operation during system generation. 

The master computer consists of minicomputer and sophisticated 
interface to the system bus. This interface is actually a microcomputer 
which handles the DMA operations and the interrupt control. Four 8 
bit data ports are dedicated to communication with the minicomputer, 2 
ports are used for DMA transfers, 8 ports are occupied by the 
system bus and 3 ports are used for general control information. 

Also, the interface microcomputer functions as a programmable system 
clock which generates the basic cycle time and counts the true time 
needed for the multiprocessor system to complete the tasks. 

The data base of the interface computer includes an address list 
of all tasks to be performed. State variables appear first in this list. 
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The addresses are sent to the system bus during the cycles, providing 
bus grants, according to the following algorithm: 


1 . 

Wait for beginning of cycle signal 

2. 

Reset task counter 

3. 

Ouput next address in list 

4. 

If S2 = 1 

go to 3 

5. 

If SI = 0 

go to 5 

6. 

If SI = 1 

go to 6 

7. 

If end of 

list go to 1 

8. 

Go to 3. 



Step 4 is used as feed back to check bus request. Steps 5, 6 
indicate the reception of the data by other modules. The algorithm by 
which the task list is searched can be changed easily from linear to any 
other desired priority schedule. Finally, the independent nature of the 
computers may lead to a different concept of implementation: instead of 

one cycle time for the whole system we can assign a single task to each 
computer and let it run freely. This demands much larger hardware 
support, but may reduce discretization error and increase speed signifi- 
cantly. This idea necessitates some changes in the control programs, 
but the hardware is completely suitable. 

Man/machine communicatiosn are performed by the minicomputer. 
This includes a high level simulation language, an optimization algorithm 
to assign tasks to available computers and utility programs such as a 
file manager, DMA driver, etc. A simulation language has not been 
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developed yet, although the block diagram approach previously men- 
tioned seems attractive. The assignment policy of tasks to different 
processors has a direct influence on speed and performance. The ideal 
solution may cause an equal share to be given to each one, and also 
eliminate idle cycles when one processor waits for results from another 
processor which has not yeat completed its job. Three algorithms were 
considered : 

1. Critical path determination. 

2. Trees. 

3. "Fill up" tables. 

The first one is based on the fact that each simulation system has 
a limit of parallelism. This is determined by the longest (most time 
consuming) path of tasks that feed each other with results, and must 
therefore be evaluated sequentially. Finding the critical path helps to 
determine the minimal cycle time. The rest of the tasks must now be 
examined and shared between the rest of the processors. This adds 
complexity to the algorithm because it does not provide a way to con- 
tinue the sharing procedure. 

The second one suggests building trees of connected tasks, with 
weights on the branches, relative to execution times. Again, there is 
no simple way to determine the relations between the number of avail- 
able processors and the parts of the trees, especially for tasks whose 
results appear several times in the calculations. 

The third one seems the clearest and simplest to implement, al- 
though run time may be longer. The data base consists of the tasks 
determined earlier. In addition, there is a work list which contains 
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initial conditions and a library. The library relates experimental exe- 
cution time to each function. Each processor is represented by a file. 
A time vector describes the current situation of each processor. 

The algorithm assigns the next task to the file whose cor- 
responding element in the time vector is minimal, in order to keep time 
values in the vector as close to each other as possible. The tasks are 
chosen from the original list if their operands appear in the work list. 
A task which is being assigned to a file is also added to the work list 
indicating that at this stage its calculation is complete. This algorithm 
does not always determine the optimal task share because of the random 
way they are chosen from the original list. The timing diagram in 
Figure 4-2 is a result of using the described algorithm. Each task 

being processed has its operands ready before the execution, and no 
need for waiting loops arises. 

4.4 Summary 

An experimental, reconfigurable multimicrocomputer system for real 
time simulations has been described. Its features may be further 
investigated, using the demonstrative system that was built. An effort 
should be continued to improve software and hardware: high level 

simulation language must be developed using the block diagram repre- 
sentation and modular l/o interfaces must be designed, based on the 
existing bus structure and operation. 

The current system is constructed of three slave microcomputers 
and a controller computer connected to HP 2100 minicomputer. Each 
computer is a modular unit, built on one board and connected to S100 
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mother board (15 slots). Cost of hardware is less than $2400, includ- 
ing the computers, chassis and power supply. 
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